1.  Introduction.


A finite-state Markov chain is a stochastic process in which the variable takes on one of a finite
number of values. Although the values may be numerical, they need not be; they may be simply
states or categories. If they are given numerical values, the values are not necessarily the first so
many integers (as they are for the number of customers waiting in a queue). When the chain is
described in terms of states, it is convenient for many purposes to treat the chain as a vector-valued
process. The vector has 1 in the position corresponding to the given state and 0 in the other
positions. Then the vector-valued process is first-order autoregressive in the wide sense when the
Markov chain is first-order. Anderson (1979a), (1979b), (1980) pointed out analogies between
Gaussian autoregressive processes and Markov chains in terms of moments, sufficient statistics,
tests of hypotheses, etc.

In this paper the consequences of the autoregressive structure of the vector-valued process are
developed further to yield various second-order moments and the spectral density of the process.
It is shown that the mean of this process, used as an estimator of the stationary probabilities,
is asymptotically equivalent to the maximum likelihood estimator and is asymptotically efficient
(Section 4). The numerical-valued Markov chain is considered as a linear function of the
vector-valued process, and a simple condition is obtained for it to be a wide-sense first-order
autoregressive process (Section 5).


2.  A  Stationary  Markov  Chain. 

A stationary Markov chain $\{x_t\}$ with discrete time parameter and states $1, \ldots, m$ is defined
by the transition probabilities

(1)  $\Pr\{x_t = j \mid x_{t-1} = i, x_{t-2} = k, \ldots\} = p_{ij}, \quad i, j, k, \ldots = 1, \ldots, m, \quad t = \ldots, -1, 0, 1, \ldots,$

where $p_{ij} \ge 0$ and $\sum_{j=1}^{m} p_{ij} = 1$. Let $\Pr\{x_t = i\} = p_i$, $i = 1, \ldots, m$ ($p_i \ge 0$ and $\sum_{i=1}^{m} p_i = 1$).
Then

(2)  $\sum_{i=1}^{m} p_i p_{ij} = p_j, \quad j = 1, \ldots, m.$

If $P = (p_{ij})$ and $p = (p_i)$, a column vector, then the above properties can be written as

(3)  $p'P = p', \quad Pe = e, \quad p'e = 1,$

where $e = (1, 1, \ldots, 1)'$. Let $\varepsilon_i$ be the $m$-component vector with 1 in the $i$-th position and 0's
elsewhere, and let $\{z_t\}$ be a sequence of $m$-component random vectors. Then the Markov chain
can be written

(4)  $\Pr\{z_t = \varepsilon_j \mid z_{t-1} = \varepsilon_i, z_{t-2} = \varepsilon_k, \ldots\} = p_{ij}, \quad i, j, k, \ldots = 1, \ldots, m, \quad t = \ldots, -1, 0, 1, \ldots.$

Note that $z_t$ has 1 in one position and 0 in the others; hence $e'z_t = 1$.
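The vector representation and the stationarity conditions in (3) can be illustrated numerically. The following is a minimal sketch, not part of the original development: the three-state matrix `P` and the helper names `one_hot` and `stationary` are hypothetical, and the stationary vector is approximated by repeated multiplication, which assumes an irreducible, aperiodic chain.

```python
def one_hot(state, m):
    """Indicator vector for a 1-based state label among m states."""
    return [1 if k == state - 1 else 0 for k in range(m)]


def stationary(P, iters=200):
    """Approximate p with p'P = p' by repeatedly applying p' <- p'P."""
    m = len(P)
    p = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]
    return p


# Hypothetical 3-state transition matrix (each row sums to 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

p = stationary(P)
z = one_hot(2, 3)  # the chain is in state 2
```

For this example the iteration settles on p = (0.25, 0.5, 0.25)', and the indicator vector satisfies e'z = 1 by construction.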


3.  Second-order  Moments. 
It follows from the model that

(5)  $\mathcal{E}(z_{jt} \mid z_{t-1} = \varepsilon_i, z_{t-2} = \varepsilon_k, \ldots) = p_{ij},$

where $z_{jt}$ is the $j$-th component of $z_t$. We can write (5) in vector form as

(6)  $\mathcal{E}(z_t \mid z_{t-1}, z_{t-2}, \ldots) = P'z_{t-1}.$

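Let

(7)  $v_t = z_t - P'z_{t-1}$

be the $t$-th disturbance. Then

(8)  $\mathcal{E}v_t = \mathcal{E}\{\mathcal{E}[(z_t - P'z_{t-1}) \mid z_{t-1}, z_{t-2}, \ldots]\} = 0.$

In (8) the outer expectation is with respect to $z_{t-1}, z_{t-2}, \ldots$. Similarly

(9)  $\mathcal{E}v_t z_{t-s}' = \mathcal{E}[\mathcal{E}(v_t \mid z_{t-1}, z_{t-2}, \ldots)\, z_{t-s}'] = 0, \quad s = 1, 2, \ldots.$

Since $v_{t-s} = z_{t-s} - P'z_{t-s-1}$,

(10)  $\mathcal{E}v_t v_{t-s}' = 0, \quad s = 1, 2, \ldots.$

Thus $\{v_t\}$ is a sequence of uncorrelated random vectors.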

We can iterate $z_t = P'z_{t-1} + v_t$ to obtain

(11)  $z_t = v_t + P'v_{t-1} + \cdots + (P')^{s-1}v_{t-s+1} + (P')^s z_{t-s}.$

Then

(12)  $\mathcal{E}(z_t \mid z_{t-s}, z_{t-s-1}, \ldots) = (P')^s z_{t-s}.$

This conditional expected value could alternatively be obtained from the fact that the transition
probabilities from $x_{t-s}$ to $x_t$ are the elements of $P^s$.




Since $\{x_t\}$ is stationary, $\{z_t\}$ is stationary and $z_t$ has a marginal multinomial distribution with
probabilities $p_1, \ldots, p_m$. Hence, the expectation of $z_t$ is

(13)  $\mathcal{E}z_t = p,$

and the covariance matrix is

(14)  $\mathrm{Var}(z_t) = D_p - pp' = V,$

say, where $D_p$ is a diagonal matrix with $i$-th diagonal element $p_i$. From (11) we also find

(15)  $\mathcal{E}z_t z_{t-s}' = (P')^s \mathcal{E}z_{t-s} z_{t-s}' = (P')^s D_p, \quad s = 0, 1, \ldots.$

Since $\mathcal{E}z_t = \mathcal{E}z_{t-s} = p$ and $(P')^s p = p$, the covariance matrix between $z_t$ and $z_{t-s}$ is

(16)  $\mathrm{Cov}(z_t, z_{t-s}) = (P')^s V, \quad s = 0, 1, \ldots.$

Thus (14) and (16) determine the second-order moments of $\{z_t\}$.
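As a numerical check on (14) and (16), the lagged covariance matrix can be compared with the joint probabilities of the chain, since the $(j, i)$ element of $\mathrm{Cov}(z_t, z_{t-s})$ is $\Pr\{x_t = j, x_{t-s} = i\} - p_j p_i$. The sketch below is illustrative only; the example chain and the function names are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def variance_matrix(p):
    """V = D_p - p p', the covariance matrix of z_t in (14)."""
    m = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]


def lag_covariance(P, p, s):
    """Cov(z_t, z_{t-s}) = (P')^s V, as in (16)."""
    C = variance_matrix(p)
    Pt = transpose(P)
    for _ in range(s):
        C = matmul(Pt, C)
    return C


# Hypothetical example chain; p is its stationary vector.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
p = [0.25, 0.5, 0.25]
C2 = lag_covariance(P, p, 2)
```

Each column of `C2` sums to zero, which reflects $e'z_t = 1$.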

The conditional covariance matrix of $z_t$ and of $v_t$ is

(17)  $\mathrm{Var}(z_t \mid z_{t-1} = \varepsilon_i, z_{t-2}, \ldots) = \mathrm{Var}(v_t \mid z_{t-1} = \varepsilon_i, z_{t-2}, \ldots) = D_{\mathbf{p}_i} - \mathbf{p}_i\mathbf{p}_i' = V_i,$

say, where $D_{\mathbf{p}_i}$ is a diagonal matrix with $p_{ij}$ as the $j$-th diagonal element and

(18)  $\mathbf{p}_i = (p_{i1}, \ldots, p_{im})'.$

From this conditional variance of $v_t$ (which has conditional mean value 0) we find the (marginal)
covariance matrix of $v_t$ as

(19)  $\mathcal{E}v_t v_t' = \sum_{i=1}^{m} p_i V_i = \sum_{i=1}^{m} p_i (D_{\mathbf{p}_i} - \mathbf{p}_i\mathbf{p}_i') = D_p - \sum_{i=1}^{m} p_i \mathbf{p}_i\mathbf{p}_i'$

by (2). Note that

(20)  $P'VP = P'D_pP - P'pp'P = \sum_{i=1}^{m} p_i \mathbf{p}_i\mathbf{p}_i' - pp'.$

Thus

(21)  $V = P'VP + \sum_{i=1}^{m} p_i V_i.$

The second-order moments of $\{z_t\}$ are the second-order moments of a first-order autoregressive
process with coefficient matrix $P'$ and disturbance covariance matrix $\sum_{i=1}^{m} p_i V_i$. (See Anderson
(1971), Sections 5.2 and 5.3, for example.) However, the conditional covariance matrix of $v_t$ given
$z_{t-1}$ depends on $z_{t-1}$. This fact shows that $v_t$ and $z_{t-1}$ are dependent, though uncorrelated.

Let $\lambda_1 = 1, \lambda_2, \ldots, \lambda_m$ be the characteristic roots of $P$, $t_1 = e, t_2, \ldots, t_m$ be the corresponding
right-sided (column) characteristic vectors, and $q_1' = p', q_2', \ldots, q_m'$ be the corresponding
left-sided (row) characteristic vectors. It is assumed that there are $m$ linearly independent right- and
left-sided vectors (that is, that the elementary divisors of $P$ are simple). Let


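(22)  $\Lambda = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \Lambda_2 \end{pmatrix},$

(23)  $Q = (p, q_2, \ldots, q_m) = (p, Q_2),$

(24)  $T = (e, t_2, \ldots, t_m) = (e, T_2).$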

Normalize the vectors so $T'Q = I$; that is, $T' = Q^{-1}$ and $Q' = T^{-1}$. Then

(25)  $P = T\Lambda Q' = T\Lambda T^{-1},$

(26)  $P^s = T\Lambda^s Q' = T\Lambda^s T^{-1}.$

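Since $e'V = e'(D_p - pp') = 0$,

(27)  $(P')^s V = Q\Lambda^s T'V = (p, Q_2) \begin{pmatrix} 1 & 0 \\ 0 & \Lambda_2^s \end{pmatrix} \begin{pmatrix} e' \\ T_2' \end{pmatrix} (D_p - pp') = Q_2\Lambda_2^s T_2' D_p = (P' - pe')^s D_p, \quad s = 1, 2, \ldots.$

We shall assume that the chain is irreducible and aperiodic. Then $|\lambda_i| < 1$, $i = 2, \ldots, m$. As $s$
increases, the covariance function decreases as a linear combination of $\lambda_2^s, \ldots, \lambda_m^s$.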
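A two-state chain makes the decomposition concrete. The worked example below is an illustration added under the assumption $0 < a, b < 1$; the symbols $a$ and $b$ are not from the paper.

```latex
P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \qquad
p = \frac{1}{a+b}\begin{pmatrix} b \\ a \end{pmatrix}, \qquad
\lambda_2 = 1 - a - b,
\\[4pt]
t_2 = \frac{1}{a+b}\begin{pmatrix} a \\ -b \end{pmatrix}, \qquad
q_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
t_2' q_2 = 1, \quad t_2' p = 0, \quad e' q_2 = 0,
\\[4pt]
V = D_p - pp' = p_1 p_2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
(P')^s V = \lambda_2^s \, q_2 t_2' D_p
         = \lambda_2^s \, p_1 p_2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
```

Here the covariance function decays geometrically at rate $\lambda_2 = 1 - a - b$, which satisfies $|\lambda_2| < 1$ for $0 < a, b < 1$, and it alternates in sign when $a + b > 1$.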
We can write $z_t$ in the moving average representation

(28)  $z_t = p + \sum_{s=0}^{\infty} (P')^s v_{t-s}.$

Since $(P')^s v_{t-s} = Q_2\Lambda_2^s T_2' v_{t-s}$, $s = 1, 2, \ldots$, (28) converges. The representation (28) is trivial,
however, because

(29)  $(P')^s v_{t-s} = (P')^s (z_{t-s} - P'z_{t-s-1}) = (P')^s z_{t-s} - (P')^{s+1} z_{t-(s+1)}.$


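The spectral density of $\{z_t\}$ for $\lambda \ne 0$ is (Hannan (1970), p. 67, for example)

(30)  $$\begin{aligned}
&(I - P'e^{i\lambda})^{-1} V (I - Pe^{-i\lambda})^{-1} \\
&\quad = [Q(I - \Lambda e^{i\lambda})T']^{-1} V [T(I - \Lambda e^{-i\lambda})Q']^{-1} \\
&\quad = (T')^{-1}(I - \Lambda e^{i\lambda})^{-1} Q^{-1} V (Q')^{-1} (I - \Lambda e^{-i\lambda})^{-1} T^{-1} \\
&\quad = Q(I - \Lambda e^{i\lambda})^{-1} T'VT (I - \Lambda e^{-i\lambda})^{-1} Q' \\
&\quad = Q_2(I - \Lambda_2 e^{i\lambda})^{-1} T_2' D_p T_2 (I - \Lambda_2 e^{-i\lambda})^{-1} Q_2'.
\end{aligned}$$

Since $|\lambda_i| < 1$, $i = 2, \ldots, m$, $I - \Lambda_2 e^{i\lambda}$ and $I - \Lambda_2 e^{-i\lambda}$ are nonsingular for all real $\lambda$.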


4.  Estimation  of  the  Stationary  Probabilities.

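Consider a sequence of observations on the chain, $x_1, \ldots, x_N$. These define a sequence $z_1, \ldots, z_N$
from the process $\{z_t\}$. Let

(31)  $S = \sum_{t=1}^{N} z_t.$

Then $\mathcal{E}S = Np$. The covariance matrix of $S$ is

(32)  $$\begin{aligned}
\mathcal{E}(S - Np)(S - Np)'
&= \sum_{t,s=1}^{N} \mathcal{E}(z_t - p)(z_s - p)' \\
&= \sum_{t=1}^{N}\sum_{s=1}^{t-1} \mathcal{E}(z_t - p)(z_s - p)' + \sum_{s=1}^{N}\sum_{t=1}^{s-1} \mathcal{E}(z_t - p)(z_s - p)' + \sum_{t=1}^{N} \mathcal{E}(z_t - p)(z_t - p)' \\
&= \sum_{t=1}^{N}\sum_{r=1}^{t-1} [\mathcal{E}(z_t - p)(z_{t-r} - p)' + \mathcal{E}(z_{t-r} - p)(z_t - p)'] + NV \\
&= \sum_{t=1}^{N}\sum_{r=1}^{t-1} [(P')^r V + VP^r] + NV \\
&= \sum_{t=1}^{N}\sum_{r=1}^{t-1} [Q_2\Lambda_2^r T_2' D_p + D_p T_2 \Lambda_2^r Q_2'] + NV \\
&= \sum_{t=1}^{N} [Q_2(I - \Lambda_2)^{-1}\Lambda_2(I - \Lambda_2^{t-1}) T_2' D_p + D_p T_2 (I - \Lambda_2^{t-1})\Lambda_2(I - \Lambda_2)^{-1} Q_2'] + NV \\
&= N[Q_2(I - \Lambda_2)^{-1}\Lambda_2 T_2' D_p + D_p T_2 \Lambda_2 (I - \Lambda_2)^{-1} Q_2'] \\
&\quad - [Q_2(I - \Lambda_2)^{-2}\Lambda_2(I - \Lambda_2^{N}) T_2' D_p + D_p T_2 (I - \Lambda_2^{N})\Lambda_2(I - \Lambda_2)^{-2} Q_2'] + NV.
\end{aligned}$$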

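Then the covariance matrix of

(33)  $\sqrt{N}\,\bar{z} = \frac{1}{\sqrt{N}} S$

is

(34)  $$\begin{aligned}
&Q_2(I - \Lambda_2)^{-1} T_2' D_p + D_p T_2 (I - \Lambda_2)^{-1} Q_2' - D_p + pp' \\
&\quad - \frac{1}{N} [Q_2(I - \Lambda_2)^{-2}\Lambda_2(I - \Lambda_2^{N}) T_2' D_p + D_p T_2 (I - \Lambda_2^{N})\Lambda_2(I - \Lambda_2)^{-2} Q_2'].
\end{aligned}$$

Theorem.

(35)  $\sqrt{N}(\bar{z} - p) \xrightarrow{d} N[0,\; Q_2(I - \Lambda_2)^{-1} T_2' D_p + D_p T_2 (I - \Lambda_2)^{-1} Q_2' - D_p + pp'].$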

The sum $S$ is the vector of frequencies of the states $1, \ldots, m$ being observed, and $\bar{z}$ is the
vector of relative frequencies. Since $\mathcal{E}\bar{z} = p$, the vector $\bar{z}$ is an estimator of $p$. Grenander (1954),
Rosenblatt (1956), and Grenander and Rosenblatt (1957) showed that in the case of a scalar process
with expected value a linear function of exogenous series and a stationary covariance function the
least squares estimator of this linear function is asymptotically efficient among all linear estimators.
(See also Anderson (1971), Section 10.2.) The result holds as well for vector processes. In particular,
the mean of a set of observations is an asymptotically efficient linear estimator of the constant
mean of a wide-sense stationary process, as is the case here.

An alternative method of estimating $p$ from the data $z_1, \ldots, z_N$ is to estimate $P$ and find its
left-sided characteristic vector corresponding to the characteristic root 1. If $z_1$ is given (that is,
fixed), the maximum likelihood estimator of $P$ is

(36)  $\hat{P} = \Big(\sum_{t=2}^{N} z_{t-1} z_{t-1}'\Big)^{-1} \sum_{t=2}^{N} z_{t-1} z_t'.$

(See Anderson and Goodman (1957).) Note that

(37)  $\sum_{t=2}^{N} z_{t-1} z_{t-1}' = \begin{pmatrix} \sum_{t=1}^{N-1} z_{1t} & 0 & \cdots & 0 \\ 0 & \sum_{t=1}^{N-1} z_{2t} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sum_{t=1}^{N-1} z_{mt} \end{pmatrix}$

is diagonal. In fact, the diagonal elements of (37) are the components of $S - z_N$. If there is
no absorbing state, every component of $p$ is positive and the probability that (37) is nonsingular
approaches 1 as $N \to \infty$ and $\hat{P}$ is well-defined. Since $\hat{P}$ is a consistent estimator of $P$, as $N \to \infty$
the probability approaches 1 that $e'\hat{p} = 1$ and


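(38)  $\hat{p}'\hat{P} = \hat{p}'$

have a unique solution for $\hat{p}$.
We observe that

(39)  $(S - z_N)'\hat{P} = e'\Big(\sum_{t=2}^{N} z_{t-1} z_{t-1}'\Big)\hat{P} = e'\sum_{t=2}^{N} z_{t-1} z_t' = \sum_{t=2}^{N} z_t' = (S - z_1)'.$

From (38) and (39) we obtain

(40)  $N^{\alpha-1}(z_N'\hat{P} - z_1') = N^{\alpha}(\bar{z} - \hat{p})'(\hat{P} - I) = N^{\alpha}(\bar{z} - \hat{p})'\hat{T}_2(\hat{\Lambda}_2 - I)\hat{Q}_2',$

where

(41)  $\hat{P} = \hat{T}\hat{\Lambda}\hat{Q}'.$

Since the left-hand side of (40) approaches 0 as $N \to \infty$ for $\alpha < 1$, we deduce that

(42)  $N^{\alpha}(\bar{z} - \hat{p}) \xrightarrow{p} 0.$

The two estimators of $p$ are asymptotically equivalent. See also Henry (1970).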
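The comparison of the two estimators can be illustrated on a short path. The sketch below computes the maximum likelihood estimator of $P$ as transition counts divided by row totals, together with the vector of relative frequencies, and then checks the exact finite-sample identity $N\bar{z}'(\hat{P} - I) = z_N'\hat{P} - z_1'$ that underlies the argument. The path `xs` and the function names are hypothetical.

```python
def transition_mle(xs, m):
    """MLE of P from one path of 1-based states: n_ij / n_i."""
    counts = [[0] * m for _ in range(m)]
    for prev, cur in zip(xs, xs[1:]):
        counts[prev - 1][cur - 1] += 1
    P_hat = []
    for row in counts:
        n = sum(row)
        if n == 0:
            # Some state never occurs among x_1, ..., x_{N-1}.
            raise ValueError("the diagonal matrix of state counts is singular")
        P_hat.append([c / n for c in row])
    return P_hat


def relative_frequencies(xs, m):
    """The mean vector z-bar: fraction of observations in each state."""
    N = len(xs)
    return [sum(1 for x in xs if x == i) / N for i in range(1, m + 1)]


# Hypothetical observed path on two states.
xs = [1, 2, 2, 1, 2, 1, 1, 2]
P_hat = transition_mle(xs, 2)
z_bar = relative_frequencies(xs, 2)

N = len(xs)
# Left side: N * z_bar' (P_hat - I); right side: z_N' P_hat - z_1'.
lhs = [N * (sum(z_bar[i] * P_hat[i][j] for i in range(2)) - z_bar[j])
       for j in range(2)]
rhs = [P_hat[xs[-1] - 1][j] - (1.0 if j == xs[0] - 1 else 0.0)
       for j in range(2)]
```

Because the right-hand side is bounded whatever the length of the path, the discrepancy between $\bar{z}$ and $\hat{p}$ is of order $1/N$.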


In many situations a sample $x_1, \ldots, x_N$ (or equivalently, $z_1, \ldots, z_N$) is not drawn from a
stationary process, but from a process starting with some given state. In that case one
might discard enough initial observations to ensure that over the remaining period of observation
the process is stationary or almost stationary. Thus the use of $\bar{z}$ calculated from the remaining
observations as an estimator of $p$ would waste the initial observations. (If one wanted to run a
simulation and insisted on independent observations, one would start the process with a possibly
random state, let it run until stationarity is achieved, and make one observation. Then one would
repeat the procedure. Clearly this is expensive in computational resources and is unnecessary.)

To estimate $P$ and then $p$ does not require stationarity; one can start from any state. The
estimator of $P$ is a maximum likelihood estimator, and hence the estimator of $p$ (and of $\Lambda_2$, $Q_2$, and
$T_2$) is maximum likelihood and asymptotically efficient. (Several different sequences from the same
chain can be aggregated; see Anderson and Goodman (1957).)

The investigator will typically want to estimate the asymptotic covariance matrix of the
estimator, given in (35), which involves

(43)  $D_p T_2 (I - \Lambda_2)^{-1} Q_2' = D_p \sum_{j=2}^{m} (1 - \lambda_j)^{-1} t_j q_j' = D_p \sum_{j=2}^{m} \sum_{s=0}^{\infty} \lambda_j^s t_j q_j'.$

The rate of convergence is governed by the root that is largest in absolute value. The asymptotic
covariance matrix can also be written as

(44)  $D_p \sum_{s=0}^{\infty} (P - ep')^s + \Big[D_p \sum_{s=0}^{\infty} (P - ep')^s\Big]' - D_p + pp'.$

Since

(45)  $(P - ep')^s = P^s - ep', \quad s = 0, 1, \ldots$

(with the $s = 0$ term read as $T_2 Q_2' = I - ep'$), the covariance matrix (44) is

(46)  $D_p \sum_{s=0}^{\infty} (P^s - ep') + \Big[D_p \sum_{s=0}^{\infty} (P^s - ep')\Big]' - D_p + pp'.$

The infinite sum can be approximated by a finite sum since the sum is convergent. For an estimator,
$P$ and $p$ are replaced by $\hat{P}$ and $\hat{p}$, respectively.

A possible computational method is to power $P$ until the rows of $P^s$ are similar enough, that
is, until $P^s$ is similar to $ep'$. Then (46) follows. Note that at each step only $P^s$ and the partial
sum accumulated so far need be held in memory.
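The powering method can be sketched as follows. This is one possible reading, under the assumption that the asymptotic covariance matrix is $D_p A + A' D_p - D_p + pp'$ with $A = \sum_{s \ge 0} (P^s - ep')$; the stopping rule, the tolerance, and the function names are illustrative choices, not the author's.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def asymptotic_cov(P, tol=1e-12, max_steps=10000):
    """Power P until its rows agree, accumulating the running sum of P^s."""
    m = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # P^0
    total = [[0.0] * m for _ in range(m)]  # running sum of P^s
    steps = 0  # number of powers added to the running sum
    while True:
        for i in range(m):
            for j in range(m):
                total[i][j] += power[i][j]
        steps += 1
        # Largest within-column spread: zero when every row equals p'.
        spread = max(max(power[i][j] for i in range(m)) -
                     min(power[i][j] for i in range(m)) for j in range(m))
        if spread < tol or steps >= max_steps:
            break
        power = matmul(power, P)
    p = power[0][:]  # any row of the converged power approximates p'
    # A = sum over the retained powers of (P^s - e p').
    A = [[total[i][j] - steps * p[j] for j in range(m)] for i in range(m)]
    DA = [[p[i] * A[i][j] for j in range(m)] for i in range(m)]
    cov = [[DA[i][j] + DA[j][i] - (p[i] if i == j else 0.0) + p[i] * p[j]
            for j in range(m)] for i in range(m)]
    return p, cov


# Hypothetical two-state chain with second root 0.5 and p = (0.4, 0.6)'.
P = [[0.7, 0.3],
     [0.2, 0.8]]
p, cov = asymptotic_cov(P)
```

For a two-state chain the result can be checked by hand: the matrix reduces to $V(1 + \lambda_2)/(1 - \lambda_2)$, so the example gives a leading element of $0.24 \times 3 = 0.72$. Only the current power and the running sum are stored, in line with the remark on memory.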

As a way of simulating a probability distribution of a finite set of outcomes, Persi Diaconis
(personal communication) has suggested setting up a Markov chain with this probability
distribution as the stationary probability distribution. An example of particular interest is simulating
the uniform distribution of matrices with nonnegative integer entries and fixed column and row sums
(contingency tables). Diaconis and Efron (1986) have discussed the analysis of two-way tables
based on this model.

Diaconis sets up the following procedure to generate a Markov chain in which all possible tables
(with assigned marginals) are the states of the chain. At each step select a pair of rows $(g, h)$ and
columns $(i, j)$ at random; with probability 1/2 add 1 to the $g, i$-th cell and to the $h, j$-th cell and
subtract 1 from the $g, j$-th cell and from the $h, i$-th cell. If a step would lead to a negative entry,
cancel it. Each step leaves the row and column sums as given and so determines the transition
probabilities of a Markov chain with the possible tables as states. Since the matrix of transition
probabilities is doubly stochastic (the probability of going from one state to another is the same as
the probability of the reverse), the stationary probabilities are uniform; that is, $p$ is proportional
to $e$.
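One step of such a table walk might be coded as below. This is a sketch under stated assumptions rather than Diaconis's own implementation: rows and columns are drawn as ordered pairs of distinct indices, the move is attempted with probability 1/2, and a move that would create a negative entry is cancelled. The function name and the starting table are hypothetical.

```python
import random


def table_step(table, rng):
    """Attempt one move of the walk on tables with fixed margins (in place)."""
    n_rows, n_cols = len(table), len(table[0])
    g, h = rng.sample(range(n_rows), 2)  # two distinct rows
    i, j = rng.sample(range(n_cols), 2)  # two distinct columns
    if rng.random() < 0.5:
        # Proposed move: +1 at (g, i) and (h, j), -1 at (g, j) and (h, i).
        # Cancel it if either decremented cell would become negative.
        if table[g][j] > 0 and table[h][i] > 0:
            table[g][i] += 1
            table[h][j] += 1
            table[g][j] -= 1
            table[h][i] -= 1
    return table


# Hypothetical 2 x 3 starting table.
rng = random.Random(0)
table = [[3, 1, 2],
         [0, 4, 1]]
row_sums = [sum(r) for r in table]
col_sums = [sum(c) for c in zip(*table)]
for _ in range(1000):
    table_step(table, rng)
```

Because each move adds and subtracts 1 once in each affected row and column, the margins are unchanged after any number of steps.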

Diaconis and Efron were interested in the distribution of the $\chi^2$ goodness-of-fit statistic, say
$T(z)$, where $z$ is the vector representation of the two-way table; that is, they want $\Pr\{T(Z) \le \tau\}$
for arbitrary $\tau \in [0, \infty)$. Given $p_i = \Pr\{Z = \varepsilon_i\}$ the probability can be calculated as

(47)  $\Pr\{T(Z) \le \tau\} = \sum_{i=1}^{m} p_i I[T(\varepsilon_i) \le \tau],$

where $I(\cdot)$ is the indicator function; that is, $I[T(\varepsilon_i) \le \tau] = 1$ if $T(\varepsilon_i) \le \tau$, and $= 0$ if $T(\varepsilon_i) > \tau$.
Given a sample $z_1, \ldots, z_N$, the probability (47) is estimated by

(48)  $\frac{1}{N} \sum_{t=1}^{N} I[T(z_t) \le \tau].$

The estimator of the cdf of $T(Z)$ is the empirical cdf of $T(Z)$ although $z_1, \ldots, z_N$ are not
independent. However, the sample variance of $I[T(z_t) \le \tau]$ divided by $N$ is not an estimator of the
variance of (48). If a subset of $z_1, \ldots, z_N$ is taken, say, $z_{t_1}, \ldots, z_{t_n}$, so that $\min |t_i - t_j|$ is great
enough that $z_{t_1}, \ldots, z_{t_n}$ can be considered independent, the sample variance of $I[T(z_{t_i}) \le \tau]$
provides an estimator of a lower bound to the variance of (48).
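The empirical estimate of the cdf and the selection of a widely spaced subsample can be written in a few lines. The sketch below is illustrative only; the list `values` stands in for the statistic evaluated along a simulated path, and the spacing `gap` is a hypothetical tuning choice.

```python
def empirical_cdf(stat_values, tau):
    """Fraction of sampled statistics that are <= tau."""
    return sum(1 for v in stat_values if v <= tau) / len(stat_values)


def thin(stat_values, gap):
    """Keep every gap-th value so retained draws are far apart in time."""
    return stat_values[::gap]


# Hypothetical values of T along a simulated path.
values = [0.4, 1.7, 1.7, 3.2, 0.9, 2.5, 0.4, 5.1]
estimate = empirical_cdf(values, 1.7)
spaced = thin(values, 3)
```

How large the spacing must be depends on how quickly the chain forgets its past, which is governed by the second-largest root in absolute value.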

If the specified row and column totals are such that each possible value of $T$ can come from
only one table, there is a one-to-one correspondence between the statistic and the table. That is, the
value of the statistic is simply another label for the state. Then the generation of the statistic
is given by the Markov chain, and its empirical cdf is an asymptotically efficient estimator of the
distribution of $T$. These facts suggest that even if there is not a one-to-one correspondence between the
statistic and the table, the empirical cdf is an asymptotically efficient estimator.

6.  Scoring  a  Markov  chain. 

Suppose each state is assigned a numerical value $\alpha_i$, $i = 1, \ldots, m$. Let $y_t = \alpha_i$ if and only if
$x_t$ is in state $i$. For example, the state of a queue may be indicated by the number of customers
waiting. In that case it is convenient to label the states $0, 1, \ldots, N$ and set $\alpha_i = i$; here $N$ is the
maximum number of customers who can be waiting. (In many other cases the index of the state $i$
may have no numerical meaning.)

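The process $\{y_t\}$ can also be defined by

(49)  $y_t = \alpha' z_t,$

where $\alpha' = (\alpha_1, \ldots, \alpha_m)$. The second-order moments of $\{y_t\}$ can be found from those of $\{z_t\}$. The
mean is

(50)  $\mathcal{E}y_t = \alpha' p = \mu,$

say; the variance is

(51)  $\mathrm{Var}(y_t) = \alpha' V \alpha = \alpha' D_p \alpha - (\alpha' p)^2 = \alpha' Q T' D_p \alpha - (\alpha' p)^2 = \sum_{i=2}^{m} (\alpha' q_i)(\alpha' D_p t_i);$

and the covariances are given by

(52)  $\mathrm{Cov}(y_t, y_{t-s}) = \alpha'\, \mathrm{Cov}(z_t, z_{t-s})\, \alpha = \alpha' (P')^s V \alpha = \alpha' Q_2 \Lambda_2^s T_2' D_p \alpha = \sum_{i=2}^{m} (\alpha' q_i)(\alpha' D_p t_i) \lambda_i^s.$

A similar result was derived by Reynolds (1972) in a different way.*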
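The autocovariance function of the scored process can be computed without any spectral decomposition, using the second line of (52) in the form $\alpha'(P')^s V \alpha = (P^s \alpha)'(V\alpha)$. The following sketch assumes a hypothetical two-state chain and scores; the function name is illustrative.

```python
def score_autocov(P, p, alpha, s):
    """Cov(y_t, y_{t-s}) = alpha' (P')^s V alpha, with V = D_p - p p'."""
    m = len(p)
    mu = sum(p[i] * alpha[i] for i in range(m))          # mean of y_t
    v = [p[i] * (alpha[i] - mu) for i in range(m)]       # V alpha
    u = list(alpha)
    for _ in range(s):
        u = [sum(P[i][j] * u[j] for j in range(m)) for i in range(m)]  # P u
    return sum(u[i] * v[i] for i in range(m))            # (P^s alpha)' (V alpha)


# Hypothetical two-state chain: p = (0.4, 0.6)', second root 0.5.
P = [[0.7, 0.3],
     [0.2, 0.8]]
p = [0.4, 0.6]
alpha = [0.0, 1.0]  # y_t is the indicator of state 2
gamma = [score_autocov(P, p, alpha, s) for s in range(4)]
```

With these scores the covariances are $0.24 \times 0.5^s$, the single-term case of the sum in (52).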

The nature of the process $\{y_t\}$ can be found from the nature of the $\{z_t\}$ process. Let $\mathcal{L}$ be the
lag operator; that is, $\mathcal{L}z_t = z_{t-1}$. Then the $\{z_t\}$ process can be written in autoregressive form as

(53)  $(I - P'\mathcal{L}) z_t = v_t.$

Since $P = T\Lambda Q'$ and $T'Q = I$, multiplication of (53) on the left by $T'$ gives

(54)  $(I - \Lambda\mathcal{L}) w_t = u_t,$

where

(55)  $w_t = T' z_t = \begin{pmatrix} e' \\ T_2' \end{pmatrix} z_t = \begin{pmatrix} 1 \\ w_t^{(2)} \end{pmatrix}$

and

(56)  $u_t = T' v_t = \begin{pmatrix} e' \\ T_2' \end{pmatrix} v_t = \begin{pmatrix} 0 \\ u_t^{(2)} \end{pmatrix}.$

* I am indebted to Jayaram Muthuswamy for calling my attention to this problem and associated
literature.

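We can re-write the last $m - 1$ components of (54) as

(57)  $(I - \Lambda_2\mathcal{L}) w_t^{(2)} = u_t^{(2)}.$

The first component of (54) is $(1 - \mathcal{L})1 = 0$. Then

(58)  $w_t^{(2)} = (I - \Lambda_2\mathcal{L})^{-1} u_t^{(2)}.$

Since $QT' = I$, we have $z_t = Qw_t$, and (58) yields

(59)  $y_t = \alpha' z_t = \alpha' (p, Q_2) \begin{pmatrix} 1 \\ w_t^{(2)} \end{pmatrix} = \alpha' p + \alpha' Q_2 (I - \Lambda_2\mathcal{L})^{-1} u_t^{(2)} = \alpha' p + \alpha' Q_2 (I - \Lambda_2\mathcal{L})^{-1} T_2' v_t.$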


Multiplication of (59) by $|I - \Lambda_2\mathcal{L}|$ yields

(60)  $\prod_{j=2}^{m} (1 - \lambda_j\mathcal{L})\,(y_t - \mu) = \sum_{j=2}^{m} (\alpha' q_j) \prod_{h=2,\, h \ne j}^{m} (1 - \lambda_h\mathcal{L})\; t_j' v_t.$

This equation defines an autoregressive moving average process (in the wide sense) with an
autoregressive part of order $m - 1$ and a moving average part of order at most $m - 2$. Note that if $\alpha_j = 1$
for some $j$ and $\alpha_i = 0$ for $i \ne j$, then (60) defines the marginal process of $z_{jt}$.

The  above  development  is  in  terms  of  real  roots  and  vectors.  If  a  pair  of  roots  are  complex 
conjugate,  the  corresponding  pairs  of  vectors  are  complex  conjugate  and  the  analysis  goes  through 
as  before. 

The covariance function (52) will be the covariance function of a first-order autoregression if
there is only one term in the sum on the right-hand side. Let $Q_2 = (q_2, Q_3)$, $T_2 = (t_2, T_3)$, and

(61)  $\Lambda_2 = \begin{pmatrix} \lambda_2 & 0 \\ 0 & \Lambda_3 \end{pmatrix}$

for a real root $\lambda_2$. Then

(62)  $\mathrm{Cov}(y_t, y_{t-s}) = (\alpha' q_2)(\alpha' D_p t_2) \lambda_2^s + \alpha' Q_3 \Lambda_3^s T_3' D_p \alpha.$

This is the covariance function of a first-order autoregressive process if $\lambda_2$ is real and either
$\alpha' Q_3 = 0$ or $\alpha' D_p T_3 = 0$. This fact was given by Lai (1978) in his Theorem 2.3 under the assumption
that all of the characteristic roots are real and distinct.

The alternative conditions may be written as

(63)  $\alpha' Q = (\mu, \alpha' q_2, 0)$

and

(64)  $\alpha' D_p T = (\mu, \alpha' D_p t_2, 0).$

Multiply (63) on the right by $Q^{-1} = T'$ and (64) by $T^{-1} D_p^{-1} = Q' D_p^{-1}$ to obtain

(65)  $\alpha' = \mu e' + (\alpha' q_2) t_2'$

and

(66)  $\alpha' = \mu e' + (\alpha' D_p t_2) q_2' D_p^{-1}.$

The conclusion is that $\{y_t\}$ is a first-order autoregressive process (in the wide sense) if the Markov
chain is irreducible and aperiodic, if there are $m$ linearly independent characteristic vectors, and
if the vector $\alpha$ is a linear combination of $e$ and of either a right-sided characteristic vector of
$P$ corresponding to a real root (other than 1) or a left-sided vector corresponding to a real root
multiplied by $D_p^{-1}$.
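The condition can be checked on a small example. The sketch below uses a hypothetical three-state queue-like chain whose roots are 1, 0.5, and 0, with right-sided vector $(1, 0, -1)'$ for the root 0.5; the scores $(0, 1, 2)' = e - (1, 0, -1)'$ then satisfy the condition, while the scores $(0, 1, 4)'$ do not. The function names and the tolerance are illustrative assumptions.

```python
def score_autocov(P, p, alpha, s):
    """Cov(y_t, y_{t-s}) = alpha' (P')^s V alpha, with V = D_p - p p'."""
    m = len(p)
    mu = sum(p[i] * alpha[i] for i in range(m))
    v = [p[i] * (alpha[i] - mu) for i in range(m)]       # V alpha
    u = list(alpha)
    for _ in range(s):
        u = [sum(P[i][j] * u[j] for j in range(m)) for i in range(m)]  # P u
    return sum(u[i] * v[i] for i in range(m))


def looks_first_order(P, p, alpha, lags=5, tol=1e-12):
    """Check gamma(s) = gamma(0) * rho^s for s = 0..lags, with rho = gamma(1)/gamma(0)."""
    g0 = score_autocov(P, p, alpha, 0)
    rho = score_autocov(P, p, alpha, 1) / g0
    return all(abs(score_autocov(P, p, alpha, s) - g0 * rho ** s) < tol
               for s in range(lags + 1))


# Hypothetical chain with stationary vector p and characteristic roots 1, 0.5, 0.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
p = [0.25, 0.5, 0.25]

queue_scores = [0.0, 1.0, 2.0]   # e minus the right vector (1, 0, -1)'
other_scores = [0.0, 1.0, 4.0]   # not of the required form

ok = looks_first_order(P, p, queue_scores)
not_ok = looks_first_order(P, p, other_scores)
```

For the queue scores the covariances are $0.5 \times 0.5^s$. For the second set the lag-0, lag-1, and lag-2 covariances are 2.25, 1, and 0.5, which no single geometric rate can match.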